Detecting Novel Discrepancies in Highly Dynamic Information Networks

Authors

  • James Abello
  • Tina Eliassi-Rad
  • Nishchal Devanur
Abstract

We address the problem of detecting characteristic patterns in highly dynamic information networks (e.g., a retweet graph). We introduce a scalable approach based on set-system discrepancy. By implicitly labeling each network edge with the sequence of times at which its two endpoints connect, we view an entire information network as a set-system. This view allows us to use combinatorial discrepancy as a mechanism to “observe” system behavior at different time scales. We illustrate our approach, called Discrepancy-based Novelty Detector (DND), on a diverse set of networks such as emails, Bluetooth connections, and tweets. DND has almost linear runtime complexity in the number of pairwise connections (i.e., communications between vertices) and linear storage complexity in the number of vertices. Examples of novel discrepancies that it detects are asynchronous connections and disagreements in the firing rates of individuals relative to the network as a whole.

Discrepancy-based Novelty Detector (DND)

On a set of vertices $V$, consider as input a collection of time-stamped communication pairs $\langle (x, y), t_i \rangle$, where $x$ and $y$ are elements of $V$ and $t_i$ indicates a time-stamp when the edge $(x, y)$ was active. For each edge $e = (x, y)$, let $T_{e,t}$ denote the set of time-stamps ($t_i \le t$) in which $e$ is active. So, $|T_{e,t}|$ is the frequency of communications on edge $e = (x, y)$. We denote the collection of active node-pairs up to time $t$ by $E_t = \{e = (x, y) : T_{e,t} \text{ is non-empty}\}$.

Each edge in $E_t$ has a firing rate $fr(e, t) = |T_{e,t}| / t$, and its corresponding firing sequence is $fr(e) = \langle fr(e, t) \rangle$. The firing rate of any subset $E'$ of $E_t$ is the sum of the firing rates of the edges in $E'$ up to time $t$, and its corresponding firing sequence is denoted by $fr(E') = \langle fr(E', t) \rangle$. The firing rate of a vertex $x$ (up to a particular time $t$) is the sum of the firing rates of its incident edges up to that time $t$. So, a dynamic network is a graph sequence $\{G_t = (V, E_t)\}$ with a corresponding firing sequence $\langle fr(E_t) \rangle$.
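The firing-rate definitions above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: edges are treated as undirected, time-stamps are positive numbers, repeated events on the same edge at the same time-stamp count once (because $T_{e,t}$ is a set), and the function names `edge_timestamps`, `firing_rates`, `vertex_firing_rate`, and `graph_firing_rate` are hypothetical.

```python
from collections import defaultdict


def edge_timestamps(events, t):
    """Map each edge e to T_{e,t}: the sorted time-stamps t_i <= t at which e was active.

    `events` is an iterable of ((x, y), t_i) pairs.
    Assumption: (x, y) and (y, x) denote the same undirected edge.
    """
    stamps = defaultdict(set)
    for (x, y), ti in events:
        if ti <= t:
            stamps[tuple(sorted((x, y)))].add(ti)
    return {e: sorted(ts) for e, ts in stamps.items()}


def firing_rates(events, t):
    """fr(e, t) = |T_{e,t}| / t for every edge in E_t (assumes t > 0)."""
    return {e: len(ts) / t for e, ts in edge_timestamps(events, t).items()}


def vertex_firing_rate(edge_rates, x):
    """Sum of the firing rates of the edges incident to vertex x."""
    return sum(rate for e, rate in edge_rates.items() if x in e)


def graph_firing_rate(edge_rates):
    """fr(E_t): sum of the firing rates of all active edges."""
    return sum(edge_rates.values())


# Small made-up event stream, used only to exercise the functions.
events = [
    (("a", "b"), 1),
    (("b", "a"), 2),
    (("b", "c"), 2),
    (("a", "b"), 4),
    (("a", "c"), 5),  # later than t = 4, so ignored below
]
rates = firing_rates(events, t=4)
print(rates)
print(vertex_firing_rate(rates, "b"), graph_firing_rate(rates))
```

Evaluating `firing_rates` for a sequence of increasing values of `t` yields the firing sequences described above. Recomputing from scratch at each `t` is done here for clarity only and does not reflect the near-linear runtime stated in the abstract.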
We will refer to $fr(E_t)$ as the firing rate of $G_t$. Our overall approach consists of comparing the firing (and acceleration) sequence of an edge or a vertex with the firing (and acceleration) sequence of the graph in which it resides. Each $G_t$ can be seen as the set-system $S_t = \{T_{e,t} : T_{e,t} \text{ is a subset of a fixed ground set of time-stamps } T\}$. Therefore, a time-varying graph becomes a special set-system, and we adapt tools from set-systems' discrepancy theory [1] to study aspects of its behavior.

For the set-system $S_t = \{T_{e,t} : T_{e,t} \subseteq \{t_0, \ldots, t\}\}$ associated with a graph $G_t$ in the time-graph sequence $\{G_t\}$, and any two-coloring function $\chi : T \to \{-1, 1\}$, let $\chi(T_{e,t}) = \sum_{t_i \in T_{e,t}} \chi(t_i)$. This is called the discrepancy of the edge $e$ at time $t$ with respect to the coloring $\chi$, and, abusing notation, we denote it by $\chi(e, t)$. The $\chi$-discrepancy of $S_t$ is $\max\{\chi(T_{e,t}) : T_{e,t} \in S_t\}$. The discrepancy of the set-system $S_t$ is the minimum over all $\chi$ of the $\chi$-discrepancy of $S_t$.

It follows from a fundamental result in combinatorial discrepancy that the maximum discrepancy of any of our set-systems is less than or equal to $\sqrt{2 t' \ln(2 m_{t'})}$, where $m_{t'}$ is the overall number of active edges up to time $t'$. Moreover, a random, uniform, and independent coloring of $\{t_0, \ldots, t\}$ achieves this bound. This provides us with a mechanism to associate with each edge $e$ at time $t$ a $\chi$-weight in the following manner: $\chi\text{-}wgt(e, t) = \left|\, |\chi(e, t)| - \sqrt{2 t \ln(2 m_t)} \,\right|$. The $\chi$-weight of an edge up to time $t$ measures how far its $\chi$ value, $\chi(e, t)$, is from the corresponding theoretical upper bound on discrepancy, $\sqrt{2 t \ln(2 m_t)}$, which we refer to from now on as $sbp(t)$. An edge $e$ will be called $\chi$-novel if its $\chi\text{-}wgt(e, t)$ deviates substantially from the mean of the weight distribution $\chi\text{-}wgt(E_t)$. When the discrepancy $\chi(e, t)$ is close to 0, it can be interpreted as an indication that $e$'s “activity pattern”

[1] B. Chazelle. The Discrepancy Method. Cambridge University Press, 2001.
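The edge discrepancy, the bound $sbp(t)$, and the $\chi$-weight defined above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes time-stamps are the integers 1 through $t$, so that $t$ is also the size of the ground set. The text does not quantify "deviates substantially", so the rule in `chi_novel` (more than `k` population standard deviations from the mean) is a hypothetical stand-in, and all function names are invented for this sketch.

```python
import math
import random
from statistics import mean, pstdev


def random_coloring(timestamps, seed=None):
    """Uniform, independent two-coloring chi: T -> {-1, +1}."""
    rng = random.Random(seed)
    return {ti: rng.choice((-1, 1)) for ti in timestamps}


def edge_discrepancy(stamps, chi):
    """chi(e, t): sum of the colors of the time-stamps in T_{e,t}."""
    return sum(chi[ti] for ti in stamps)


def sbp(t, m_t):
    """Upper bound sqrt(2 t ln(2 m_t)), with m_t active edges up to time t."""
    return math.sqrt(2 * t * math.log(2 * m_t))


def chi_weights(T, chi, t):
    """chi-wgt(e, t) = | |chi(e, t)| - sbp(t) | for each edge in T.

    `T` maps each active edge to its time-stamps T_{e,t}.
    """
    bound = sbp(t, len(T))
    return {e: abs(abs(edge_discrepancy(ts, chi)) - bound) for e, ts in T.items()}


def chi_novel(weights, k=2.0):
    """Hypothetical novelty rule: weight more than k std. deviations from the mean."""
    mu = mean(weights.values())
    sigma = pstdev(weights.values())
    if sigma == 0:
        return set()
    return {e for e, w in weights.items() if abs(w - mu) > k * sigma}


# Made-up example with a fixed coloring so the numbers are reproducible.
T = {("a", "b"): [1, 2, 3, 4], ("b", "c"): [1, 3], ("a", "c"): [2]}
chi = {1: 1, 2: -1, 3: 1, 4: -1}
weights = chi_weights(T, chi, t=4)
print(weights)
print(chi_novel(weights, k=1.0))
```

In practice `chi` would come from `random_coloring(range(1, t + 1))`. The fixed coloring is used here only so that the example is deterministic.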

Publication date: 2010